Abstract

Building robust letter-to-sound correspondences is a prerequisite for reading, and such 
audiovisual integration becomes progressively automatic with development. However, 
the neural mechanisms underlying the development of audiovisual integration for 
reading are largely unknown. This study used functional magnetic resonance imaging 
(fMRI) in a lexical decision task to investigate the changes of brain functional 
networks that support audiovisual integration for reading between normally 
developing children (9-12 years old) and adults (20-28 years old). The identified 
networks were further examined in children with developmental dyslexia (9-12 years 
old). Results revealed that adults showed enhanced connectivity in a prefrontal-superior
temporal network relative to children, reflecting attentional modulation of the
development of audiovisual integration. Moreover, this network was disrupted in
dyslexics, confirming its essential role in audiovisual integration for reading. This
study, for the first time, elucidates the neural basis underlying the development of
audiovisual integration for reading.

Introduction

Establishing reliable and robust associations between visual and auditory information 
is the foundation of reading acquisition and development (Blau et al., 2010; Blau, van 
Atteveldt, Ekkebus, Goebel, & Blomert, 2009; Holloway, van Atteveldt, Blomert, & 
Ansari, 2015; van Atteveldt, Formisano, Goebel, & Blomert, 2004). Dysfunction in 
integrating orthographic and phonological information into a unified audiovisual 
percept has been identified as a vital factor of reading failure in alphabetic languages 
(Blau et al., 2010; van Atteveldt et al., 2004) and logographic languages (Yang, Yang, 
Li, Xu, & Bi, 2020). Congruent phonological information and visual scripts 
complement each other, thus improving the accuracy and responsiveness of visual
word recognition (Kast, Bezzola, Jancke, & Meyer, 2011; Raij, Uutela, & Hari, 2000).


At the neural level, numerous neuroimaging studies using different task paradigms 
have identified several brain regions involved in audiovisual integration for reading. 
For example, by comparing brain activation between the audiovisual responses and
the sum of the unisensory responses, Raij et al. (2000) and van Atteveldt et al.
(2004) found that the bilateral superior temporal gyrus/superior temporal sulcus
(STG/STS), the left frontoparietal region and the right frontal cortex were engaged in
the integration of letters and speech sounds in skilled adult readers. In addition,
several studies have examined the brain substrates of audiovisual integration through
the congruency effect, which is based on the comparison of brain responses between
congruent and incongruent audiovisual stimulus pairs (Blau, van Atteveldt, Formisano,
Goebel, & Blomert, 2008; W. Xu,
Kolozsvári, Oostenveld, Leppänen, & Hämäläinen, 2019). The studies of adult 
readers found that the bilateral STG/STS, Heschl sulcus/planum temporale, 
middle/inferior temporal gyrus (MTG/ITG), middle/inferior frontal gyrus (MFG/IFG), 
cingulate gyrus (CG), superior parietal lobule (SPL) and fusiform gyrus (FuG) were 
engaged in audiovisual integration processing (Blau et al., 2008; Holloway et al., 
2015; van Atteveldt, Blau, Blomert, & Goebel, 2010; van Atteveldt, Formisano, 
Blomert, & Goebel, 2006; W. Xu et al., 2019). In short, activation in the superior
temporal cortex and frontal cortex was consistently observed to support audiovisual
integration during reading.


With practice and reading development, audiovisual integration processing becomes 
gradually automatic and optimal (Beierholm & Adams, 2016; Froyen, Bonte, van 
Atteveldt, & Blomert, 2009). A study of Dutch children showed that the processing 
time for letter-speech sound associations steadily decreased over the full range of 
primary school grades, despite early acquisition of associations between orthography 
and phonology (Blomert & Vaessen, 2009), suggesting an ongoing development 
towards automatic processing. Furthermore, Froyen et al. (2009) found that skilled
adult readers, but not children (10-12 years), showed a larger mismatch negativity
(MMN) amplitude when letters and speech sounds were presented simultaneously than
when they were presented separately, suggesting that the neural activity underlying
audiovisual integration in reading differs between children and adults. To the best of
our knowledge, only one early study has examined differences in the effective
connectivity of audiovisual integration between children and adults (Dick, Solodkin,
& Small, 2010), but that study examined audiovisual integration in the context of
speech comprehension. Consequently, it remains unknown how the brain activity
underlying audiovisual integration for visual word recognition (reading) differs
between children and adults.


Recent evidence shows that audiovisual integration requires interplay between 
distributed regions (Calvert, 2001; Driver & Noesselt, 2008; Paraskevopoulos, 
Kraneburg, Herholz, Bamidis, & Pantev, 2015). Specifically, audiovisual integration 
recruits high-level cognitive processes (e.g. attention and semantic processing) 
resulting in its late development (Barutchu et al., 2010; McNorgan, Randazzo-Wagner, 
& Booth, 2013; Murray & Wallace, 2012; Talsma, Senkowski, Soto-Faraco, & 
Woldorff, 2010), and thus regions involved in higher order processing might interact 
with regions involved in audiovisual integration. In this context, a large-scale 
functional network analysis may be a more informative method to understand the 
brain organization underlying audiovisual integration in reading and its development 
from childhood to adulthood. Functional networks are typically modeled as graphs 
composed of nodes (the cortical regions contributing to a network) and edges (the 
connections between nodes) (Meunier, Achard, Morcom, & Bullmore, 2009; 
Paraskevopoulos et al., 2015). Previous studies have successfully applied network
analysis methods to unveil the neurodevelopment of functional networks for reading
(X. Liu et al., 2018) and for expressive language ability (Doesburg, Tingling,
MacDonald, & Pang, 2016).


Using functional network analysis, the present study aimed to unveil the changes in 
neural mechanisms underlying the audiovisual integration for reading between 
children and adults. First, we compared the brain networks of audiovisual integration 
between normally developing child readers (9-12 years old) and skilled adult readers 
(20-28 years old). Following previous studies (Kast et al., 2011; Yang et al., 2020), a 
lexical decision task was used to examine audiovisual integration in a real reading 
context. Participants were asked to decide whether the visual symbols presented 
simultaneously with congruent or incongruent speech sounds were real Chinese 
characters. The congruency effect was adopted as the index of audiovisual integration 
effect (Blau et al., 2010; Blau et al., 2009). We hypothesized that, compared to
children, skilled adult readers would show greater functional connectivity in a
widespread network involving the core regions of audiovisual integration (such as
STG, MFG and IFG) and the higher-order association cortices (such as prefrontal and
parietal cortices).


Afterwards, the identified functional networks that differed between the two age 
groups were explored in a sample of children with developmental dyslexia. The 
rationale was that if the abovementioned functional networks are critical to
audiovisual integration development for reading, we would expect to observe the
disruption of these functional networks in individuals with dyslexia.


Materials and Methods 

Participants 

Twenty-five normally developing children (9 females, mean age = 11.45 ± 0.83 years,
abbreviated as CH), twenty-one adults (11 females, mean age = 23.85 ± 2.61 years,
abbreviated as AD) and fourteen children with dyslexia (4 females, mean age = 10.99
± 1.03 years, abbreviated as DD) participated in this study. All participants were
native Mandarin Chinese speakers and were right-handed, as assessed by the
Handedness Inventory (Department of Neurology, Beijing Medical University
Hospital). All participants had normal hearing and normal or corrected-to-normal
vision, and did not suffer from ophthalmological or neurological abnormalities.


The sample size of CH and AD was determined a priori using G*Power (Version 3.1, 
http://www.gpower.hhu.de/) (Faul, Erdfelder, Lang, & Buchner, 2007), which 
indicated that a total of 34 participants were required for a medium partial η² of 0.06
(effect size f = 0.25) and a power of 0.8 with an alpha of 0.05 (Mumford, 2012;
Simonet, Roten, Spierer, & Barral, 2019). In order to compensate for potential
exclusion of participants (e.g. excessive head motion), we recruited more than 20
participants for each group.


The dyslexic participants were drawn from a previously published study (Yang et al.,
2020). The screening criteria included: 1) a reading score at least one and a half
standard deviations below the average score of children in the same grade, as assessed
by the Character Recognition Measures and Assessment Scale (CRM) (X. L. Wang & Tao,
1996); 2) a normal non-verbal intelligence quotient (IQ; standard score above 85), as
evaluated by the Combined Raven’s Progressive Matrices (CRT) (Li, Chen, & Jin, 1989);
and 3) no ADHD, as assessed by the Chinese Classification of Mental Disorder 3
(CCMD-3).


The study was approved by the ethics committee of the Institute of Psychology, 
Chinese Academy of Sciences. Each adult participant and each child participant’s
guardian gave written informed consent prior to the study. Demographic information and
the results of screening tests are shown in Table 1.


Linguistic-cognitive tests 

All participants were administered three linguistic-cognitive tests measuring reading
accuracy, reading fluency and phonological awareness, respectively. The reading
accuracy test consisted of 172 Chinese characters with varying frequency.
Participants were required to name all the characters aloud as accurately as possible,
with no time limit. The reading fluency test consisted of 160 high- and
medium-frequency Chinese characters. Participants were asked to read these characters
aloud as quickly and accurately as possible within one minute. In both tests, one point
was awarded for each character that was read correctly. In the phonological awareness
test, participants heard three syllables, one of which was different from the others in
consonant, vowel or tone (10 items for each type). Participants were asked to select
the syllable that differed from the others. One point was given for each correct
judgment. Three CH, six AD and two DD did not participate in the reading tests, so
their scores on the linguistic-cognitive tests were missing. The test scores of the
remaining participants are presented in Table 1.


Table 1 Demographic information and performance in the reading tests for all three
groups of participants

                                                                            p-values (t-tests or χ²-tests)
                         CH (n=25)         AD (n=21)        DD (n=14)         CH vs. AD    CH vs. DD
Age                      11.45 (0.83)      23.85 (2.61)     10.99 (1.03)      < 0.001      0.139
Male/Female              16/9              10/11            10/4              0.264        0.637
Raven IQ                 119.68 (15.72)    -                106.71 (14.88)    -            0.016
Reading score            2849 (260)        -                2007 (293)        -            < 0.001
Reading accuracy         104.73 (11.37)    147.47 (8.01)    84.75 (17.63)     < 0.001      < 0.001
Reading fluency          105.23 (15.06)    141.13 (18.00)   69.08 (21.35)     < 0.001      < 0.001
Phonological awareness   24.55 (0.86)      24.80 (0.80)     15.17 (1.51)      0.838        < 0.001

Notes: Data are presented as the mean (standard deviation). CH = normally developing
children, AD = adults and DD = children with dyslexia.


Stimuli and task design

The visual stimuli were 30 high-frequency Chinese characters and 30
pseudocharacters. All the real characters were compound characters composed of a 
phonetic radical and a semantic radical. The pseudocharacters were created by 
combining a phonetic and a semantic radical together in their legal positions in 
Chinese orthography, but were unpronounceable and nonsensical. The visual 
complexity (stroke number and frequency of radicals) of real characters and 
pseudocharacters was matched (see Supplementary Materials Table S1). The auditory 
stimuli consisted of 120 pronunciations of Chinese characters recorded by a female 
native speaker. Stimuli were assigned to five experimental conditions: audiovisually
congruent characters (AVcon), in which a real character and its sound were presented
simultaneously; audiovisually incongruent characters (AVincon), in which a real
character and an incongruent sound were presented; audiovisual pseudocharacters
(AVpseudo), in which a pseudocharacter appeared with the sound of a real character;
and visual real characters (Vreal) and visual pseudocharacters (Vpseudo), in which a
real character or a pseudocharacter was presented visually in isolation.


An event-related design was adopted for the fMRI scan. Each participant underwent two
runs. Each run included 15 trials of AVcon, 15 trials of AVincon, 30 trials of
AVpseudo, 15 trials of Vreal and 15 trials of Vpseudo, together with 47 null trials, for
a total of 137 trials, presented in pseudo-random order. In each task trial, a fixation
was first presented in the center of the screen for 500 ms, followed by the presentation
of stimuli for 1200 ms and a blank screen for 800 ms. The visual stimuli were presented
for 1200 ms; in the bimodal conditions, the auditory stimuli, which lasted from 185 to
509 ms, were presented simultaneously with the visual stimuli. Each null trial consisted
of 500 ms of fixation and 2000 ms of a blank screen. Following previous studies (Kast et
al., 2011; Yang et al., 2020), participants were instructed to complete a lexical
decision task, in which they were required to attend to the visual stimuli and determine
whether they were real Chinese characters or not by pressing buttons.
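
The trial structure described above can be checked with a short calculation. The sketch below is illustrative only: the variable and function names are hypothetical, and the run length it derives assumes that trials follow one another with no additional lead-in or lead-out period, which the text does not state.

```python
# Durations (ms) and trial counts are taken from the text; all names are illustrative.
FIXATION_MS = 500
STIMULUS_MS = 1200
BLANK_MS = 800
NULL_BLANK_MS = 2000
TR_MS = 2500  # repetition time reported under "Image acquisition"

TRIALS_PER_RUN = {
    "AVcon": 15, "AVincon": 15, "AVpseudo": 30,
    "Vreal": 15, "Vpseudo": 15, "null": 47,
}


def trial_duration_ms(condition: str) -> int:
    """Duration of one trial: fixation + stimulus + blank, or fixation + blank for null trials."""
    if condition == "null":
        return FIXATION_MS + NULL_BLANK_MS
    return FIXATION_MS + STIMULUS_MS + BLANK_MS


def run_summary(trials: dict) -> tuple:
    """Return (number of trials, total duration in ms) for one run."""
    n_trials = sum(trials.values())
    total_ms = sum(n * trial_duration_ms(cond) for cond, n in trials.items())
    return n_trials, total_ms


n_trials, total_ms = run_summary(TRIALS_PER_RUN)
print(n_trials, total_ms / 1000)  # 137 342.5
```

Under these assumptions every trial, task or null, lasts 2500 ms, which equals one repetition time, so each run would span 137 volumes (342.5 s).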


Image acquisition 


All participants underwent scanning on a 3T Siemens Prisma fit MRI scanner at the
Beijing MRI Center for Brain Research of the Chinese Academy of Sciences.
Functional MRI time series data were obtained using a BOLD-sensitive T2*-weighted
gradient-echo echo planar imaging (EPI) sequence (32 slices, slice thickness
= 3 mm with a 0.6-mm gap, in-plane resolution = 3 mm x 3 mm, flip angle = 90°,
repetition time = 2500 ms and echo time = 30 ms). High spatial resolution anatomical
images were acquired using a T1-weighted, magnetization-prepared rapid acquisition
gradient echo (MPRAGE) sequence (slice thickness = 1 mm, in-plane resolution = 1.0
mm x 1.0 mm, flip angle = 8°, repetition time = 2600 ms and echo time = 3.02 ms).


fMRI data analysis and statistics 
Preprocessing 


Image preprocessing was conducted using SPM8 (http://www.fil.ion.ucl.ac.uk/spm/,
Wellcome Department of Cognitive Neurology, University College London, London).
The fMRI time series data were first corrected for slice timing and head motion, and
then normalized into Montreal Neurological Institute (MNI) stereotactic space with
cubic voxels at 2 mm x 2 mm x 2 mm spatial resolution. Finally, the normalized
functional images were smoothed with an isotropic Gaussian kernel with a 6 mm
full-width at half-maximum. Seven CH and two DD were excluded from the following
analysis because of excessive head motion during the scanning period (> 3 mm
translation or > 3° rotation); thus, the final sample comprised 18 CH, 21 AD and 12 DD
participants.
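
The head-motion exclusion rule can be illustrated as follows. This is a minimal sketch and not the authors' script: it assumes the realignment parameters are stored as an array with three translations in mm followed by three rotations in radians (the usual layout of SPM `rp_*.txt` files), and it assumes the criterion was applied to the maximum absolute displacement within a run, which the text does not specify. The function name is hypothetical.

```python
import numpy as np


def exceeds_motion_limit(params, max_trans_mm=3.0, max_rot_deg=3.0):
    """Return True if any translation or rotation exceeds the stated limits.

    params: (n_volumes, 6) array; columns 0-2 are translations in mm and
    columns 3-5 are rotations in radians (assumed layout).
    """
    params = np.asarray(params, dtype=float)
    max_translation = np.abs(params[:, :3]).max()
    max_rotation = np.abs(np.degrees(params[:, 3:])).max()
    return bool(max_translation > max_trans_mm or max_rotation > max_rot_deg)


# Toy example: a quiet run and a run with a 3.5 mm shift on one volume.
quiet = np.zeros((137, 6))
moved = quiet.copy()
moved[60, 2] = 3.5
print(exceeds_motion_limit(quiet), exceeds_motion_limit(moved))  # False True
```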


Functional network analysis 

Creation of functional connectivity matrices 

A total of 264 functional regions, each a sphere of 10 mm diameter, were selected as
nodes based on a validated parcellation template (Power et al., 2011; Vatansever, Menon,
Manktelow, Sahakian, & Stamatakis, 2015; Wagner et al., 2019). Functional 
connectivity (FC) matrices were created with the CONN Functional Connectivity 
Toolbox (Whitfield-Gabrieli & Nieto-Castanon, 2012). Specifically, the blood oxygen 
level dependent (BOLD) time series corresponding to the AVcon and AVincon 
conditions were first extracted and concatenated separately over trials. Nuisance 
BOLD signal fluctuations from cerebrospinal fluid and white matter were estimated 
and removed using the anatomical component correction (CompCor) strategy
(Behzadi, Restom, Liau, & Liu, 2007). Head motion (6 motion parameters and 6
first-order temporal derivatives) as well as the main effect of task were also regressed
out. The data were high-pass filtered at 0.008 Hz to preserve task-relevant
high-frequency signals, which have been found to yield stronger and more reliable
evidence of effects of age (Geerligs, Tsvetanov, Cam, & Henson, 2017; X. Liu et al.,
2018). Pearson’s correlation coefficients between each pair of regional time series
were computed and transformed into Fisher’s z scores. Following this procedure,
undirected and weighted 264 x 264 FC matrices were constructed for the AVcon and
AVincon conditions for each participant (Vatansever et al., 2015).
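
The last step of this procedure, turning regional time series into a Fisher z-transformed connectivity matrix, can be sketched in a few lines. The example is an illustration rather than the CONN implementation: the function name is hypothetical, the input is random data standing in for denoised condition-specific time series, and setting the diagonal to zero is an assumption made so that self-connections do not enter later sums.

```python
import numpy as np


def fc_matrix(timeseries):
    """Pearson correlation between all node pairs, Fisher z-transformed.

    timeseries: (n_timepoints, n_nodes) array of denoised BOLD signals.
    Returns a symmetric (n_nodes, n_nodes) matrix with a zero diagonal.
    """
    r = np.corrcoef(timeseries, rowvar=False)
    np.fill_diagonal(r, 0.0)                   # ignore self-connections (assumption)
    r = np.clip(r, -0.999999, 0.999999)        # guard against arctanh(+/-1) = inf
    return np.arctanh(r)                       # Fisher's z = arctanh(r)


rng = np.random.default_rng(0)
ts = rng.standard_normal((120, 264))           # placeholder data, 264 nodes
z = fc_matrix(ts)
print(z.shape)  # (264, 264)
```

One such matrix would be computed per participant and per condition (AVcon and AVincon).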


Network-based statistics 

The network-based statistic (NBS) approach was used to identify functional networks 
underlying the differences in audiovisual integration for reading between CH and AD 
(Zalesky, Cocchi, Fornito, Murray, & Bullmore, 2012; Zalesky, Fornito, & Bullmore, 
2010), which has been widely used in neurodevelopment research (Cignetti et al., 
2018; Doesburg et al., 2016; Grayson et al., 2014). In the present study, we first
detected significant nonzero connections [false discovery rate (FDR) corrected p <
0.05] in the FC matrices for each group and condition by performing a one-sample t-test
in GRETNA (http://www.nitrc.org/projects/gretna/) (J. Wang et al., 2015). Then, a
binary matrix was created by taking the union of the significant nonzero connections
from the AVcon and AVincon FC matrices. The FC matrices masked by the ‘union’
binary matrix were entered into the NBS to identify the significant audiovisual
integration networks in the CH and AD groups, respectively. The ‘union’ binary
matrix was used as a mask to ensure that the same edges were used in the
statistical comparisons (Jiang et al., 2013; Wagner et al., 2019). A less constrained
primary threshold of p < 0.05 was used in the analysis to retain more functional 
connectivity information for the edges of functional networks underlying the neural 
differences between CH and AD. A set of supra-threshold connections was defined
based on the primary threshold, and these connections were used to determine
topological components. A component (i.e. a subnetwork) is a connected graph, in which
a path can be found between any two nodes. Following that, nonparametric permutation
tests (5000 permutations, family-wise error rate (FWER) corrected p < 0.05) were
performed to estimate the significance of each subnetwork based on its intensity
(the sum of test statistic values across all connections). The corrected p value for a
subnetwork of a given size was calculated as the proportion of permutations for which
the largest component was the same size or greater. Hubs of the identified functional
network for audiovisual integration were defined as those nodes whose strength was 
1.5 SD (standard deviation) greater than the mean strength across all nodes in the 
network (X. Liu et al., 2018). Node strength is analogous to node degree in weighted 
networks and is defined as the sum of edge weights (i.e. Fisher’s z scores) attached to
a node (Fornito, Zalesky, & Bullmore, 2016; Paraskevopoulos et al., 2015).
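
The component, significance and hub definitions given in this paragraph can be illustrated with a toy example. The sketch below is not the NBS toolbox code: the function names are hypothetical, the intensity follows the wording of the text (the sum of the test statistics of the supra-threshold edges), and the use of the sample standard deviation for the hub cutoff is an assumption. The toolbox itself may differ in these details.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def suprathreshold_components(stat, threshold):
    """Return (nodes, intensity) for each connected set of supra-threshold edges."""
    stat = np.asarray(stat, dtype=float)
    edges = np.triu(stat > threshold, k=1)          # each undirected edge once
    n_comp, labels = connected_components(
        csr_matrix(edges.astype(int)), directed=False)
    components = []
    for label in range(n_comp):
        nodes = np.flatnonzero(labels == label)
        if nodes.size < 2:                          # isolated nodes are not subnetworks
            continue
        block = np.ix_(nodes, nodes)
        intensity = float(stat[block][edges[block]].sum())
        components.append((nodes, intensity))
    return components


def fwer_p(observed, null_max):
    """Share of permutations whose largest component is at least as large as observed."""
    return float(np.mean(np.asarray(null_max) >= observed))


def hub_nodes(weights, n_sd=1.5):
    """Nodes whose strength exceeds the network mean by more than n_sd SDs."""
    weights = np.asarray(weights, dtype=float)
    strength = weights.sum(axis=1)                  # node strength = sum of edge weights
    in_network = (weights != 0).any(axis=1)
    s = strength[in_network]
    cutoff = s.mean() + n_sd * s.std(ddof=1)        # sample SD (assumption)
    return np.flatnonzero(in_network & (strength > cutoff))


# Toy statistic matrix: two components, {0, 1, 2} and {3, 4}; node 5 is isolated.
stat = np.zeros((6, 6))
for i, j, t in [(0, 1, 3.0), (1, 2, 4.0), (3, 4, 2.5)]:
    stat[i, j] = stat[j, i] = t
components = suprathreshold_components(stat, threshold=2.0)

# Toy star network: node 0 is connected to nine other nodes with weight 1.
star = np.zeros((10, 10))
star[0, 1:] = star[1:, 0] = 1.0
hubs = hub_nodes(star)
```

In a full analysis, `null_max` would hold the largest component statistic obtained in each of the 5000 permutations of the group labels.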


As a final step, a 2 (group: CH and AD) x 2 (audio-visual congruency: AVcon and
AVincon) repeated measures analysis of covariance (ANCOVA) was conducted in
NBS with FC matrices masked by a binary matrix created by the union of the
significant networks involved in audiovisual integration in each group obtained in the
previous step. Since the effect of head motion may confound between-group
differences in functional connectivity (Geerligs et al., 2017; Siegel et al., 2017; Zeng
et al., 2014), average framewise displacement (FD) estimated from the six head
movement parameters (Power, Barnes, Snyder, Schlaggar, & Petersen, 2012) was
calculated and included as a covariate. In addition, since a previous study reported
gender differences in audiovisual integration processing (Ross, Del Bene, Molholm,
Frey, & Foxe, 2015), gender was also included as a covariate to control for its
potential effects. A primary threshold of p < 0.01 was applied to the ANCOVA, and
subnetworks with a FWER-corrected p < 0.05 were retained (5000 permutations). The
visualization of functional networks was performed using BrainNet Viewer (Xia, Wang, &
He, 2013).
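
For readers unfamiliar with the motion covariate, the following sketch shows one common way to compute framewise displacement in the sense of Power et al. (2012): the sum of the absolute volume-to-volume changes in the six realignment parameters, with rotations converted to millimetres of arc on a sphere of 50 mm radius. The function name is hypothetical, and the input layout (translations in mm, then rotations in radians) and the zero assigned to the first volume are assumptions.

```python
import numpy as np


def framewise_displacement(params, radius_mm=50.0):
    """Framewise displacement per volume from six realignment parameters.

    params: (n_volumes, 6) array; columns 0-2 in mm, columns 3-5 in radians.
    """
    motion = np.asarray(params, dtype=float).copy()
    motion[:, 3:] *= radius_mm                      # radians -> arc length in mm
    deltas = np.abs(np.diff(motion, axis=0))        # change between successive volumes
    return np.concatenate([[0.0], deltas.sum(axis=1)])


# Toy example: a 0.1 mm shift plus a 0.001 rad rotation between two volumes.
params = np.array([[0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
                   [0.1, 0.0, 0.0, 0.001, 0.0, 0.0]])
fd = framewise_displacement(params)
mean_fd = fd.mean()                                 # per-participant covariate
```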


Brain-behavior correlation analyses 

In order to examine the relationship between reading ability and the functional 
networks that differed across groups, a correlation analysis was conducted between 
the functional networks (the congruency effect of connectivity strength) and reading 
performance (reading accuracy, reading fluency and phonological awareness) in CH 
and AD groups, respectively. In addition, to clarify the role of the functional networks 
during a lexical decision with congruent and incongruent stimuli, we performed a 
correlation analysis between the functional networks (the congruency effect of
connectivity strength) and in-scanner behavioral responses (accuracy and reaction time).
The significance level was set at p < 0.05 after FDR correction for multiple
comparisons.
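
The text does not name the FDR procedure that was used. As an illustration only, the sketch below implements the widely used Benjamini-Hochberg adjustment; the choice of this procedure and the function name are assumptions, and the p values in the example are made up.

```python
import numpy as np


def fdr_bh(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)           # p_(i) * n / i
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]    # enforce monotonicity
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


# Hypothetical uncorrected p values for three brain-behavior correlations.
adjusted = fdr_bh([0.01, 0.04, 0.03])
```

A correlation would then be reported as significant when its adjusted p value falls below 0.05.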


Validation analysis 

To evaluate the robustness of the observed functional networks underpinning the 
differences in audiovisual integration between CH and AD, three validation 
procedures were performed: 1) using a more stringent primary threshold of p < 0.005;
2) using another estimation method, NBS extent (the total number of connections
within a component); and 3) using an alternative 200-ROI atlas created by Craddock
et al. (Craddock, James, Holtzheimer, Hu, & Mayberg, 2012).


Functional network analysis in children with developmental dyslexia 

Given that audiovisual integration deficits in dyslexia may be caused by anomalies in 
neural development, we examined whether the functional networks that differed 
between CH and AD were disrupted in children with dyslexia. Specifically, for each 
functional network with significant differences between CH and AD, a binary matrix 
was first generated from the functional network. Each element of the binary matrix 
was set to 1 if it corresponded to a nonzero element (i.e. an edge) in the functional 
network; otherwise it was set to 0. Then, we multiplied the corresponding elements of 
the binary matrix and the FC matrix of DD in the AVcon/AVincon condition created in
CONN. Thus, we obtained a new FC matrix of the DD group that was masked by the
binary matrix. The sum of all elements of the triangle above or below the diagonal of
the resulting matrix was taken as the network connectivity strength of the DD group.
Finally, we performed a 2 (group: CH and DD) x 2 (audio-visual congruency: AVcon
and AVincon) ANCOVA of the connectivity strength of the functional network. In
addition to gender, Raven IQ scores were also entered as a covariate in the analysis,
since there was a significant difference in Raven IQ scores between the CH and DD
groups [t(28) = 2.16, p = 0.04]. The significance level was set at p < 0.05 after FDR
correction for multiple comparisons.
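
The masking and summation steps described above amount to a few array operations. The following sketch uses a toy three-node example; the function name and the numerical values are hypothetical and serve only to show the computation.

```python
import numpy as np


def network_strength(fc, network):
    """Sum of FC values over the edges of a previously identified network.

    fc:      symmetric (n, n) Fisher z matrix of one participant and condition.
    network: symmetric (n, n) matrix whose nonzero entries mark network edges.
    """
    mask = (np.asarray(network) != 0).astype(float)   # binary matrix (1 = edge)
    masked = np.asarray(fc, dtype=float) * mask       # element-wise multiplication
    return float(np.triu(masked, k=1).sum())          # upper triangle only


# Toy data: the network contains edges (0, 1) and (1, 2) but not (0, 2).
network = np.array([[0, 1, 0],
                    [1, 0, 1],
                    [0, 1, 0]])
fc_con = np.array([[0.0, 0.5, 0.2],
                   [0.5, 0.0, -0.1],
                   [0.2, -0.1, 0.0]])
fc_incon = np.array([[0.0, 0.3, 0.4],
                     [0.3, 0.0, 0.0],
                     [0.4, 0.0, 0.0]])
strength_con = network_strength(fc_con, network)
strength_incon = network_strength(fc_incon, network)
congruency_effect = strength_con - strength_incon     # AVcon minus AVincon
```

Because the matrices are symmetric, summing the upper or the lower triangle gives the same value.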


Results 

Behavioral performances 

In-scanner behavioral data for one CH and three AD were not recorded due to
technical reasons. In addition, the data of another three AD were excluded because
their response accuracy was too low (< 0.73) and they were identified as outliers by
boxplot (Schwertman, Owens, & Adnan, 2004) in both the AVcon and AVincon
conditions. Accordingly, the behavioral performance results were based on the
remaining data of 17 CH and 15 AD.


The average accuracy in the AVcon and AVincon conditions was 0.89 (SD = 0.11) and 
0.76 (SD = 0.19) for CH, and was 0.98 (SD = 0.03) and 0.95 (SD = 0.05) for AD, 
respectively. Wilcoxon signed rank tests showed that the accuracy in the AVcon
condition was higher than that in the AVincon condition for both CH (p = 0.009) and
AD (p = 0.029). Mann-Whitney tests showed that the accuracy of AD was higher than
that of CH in both AVcon (p < 0.001) and AVincon (p < 0.001) conditions.


The average reaction time in the AVcon and AVincon conditions was 744.03 ms (SD = 
68.12 ms) and 799.61 ms (SD = 84.59 ms) for CH, and was 616.66 ms (SD = 76.13 
ms) and 662.50 ms (SD = 83.82 ms) for AD. A 2 (group: CH and AD) x 2
(audio-visual congruency: AVcon and AVincon) ANCOVA with gender as a covariate
revealed a significant main effect of group [F(1, 29) = 22.37, p < 0.001, partial η² =
0.44]. CH showed longer reaction time than AD. The main effect of audio-visual
congruency was near-significant [F(1, 29) = 3.75, p = 0.063], but the interaction
between group and audio-visual congruency [F(1, 29) = 0.33, p = 0.570] was not
significant.


NBS analysis results 

The NBS analysis revealed a large-scale functional network (173 nodes and 273 edges) 
for the congruency effect (AVcon > AVincon) in AD, mainly encompassing 
intra-regional connectivity within the prefrontal, occipital and limbic cortices, as well 
as inter-regional connectivity between the prefrontal and temporal cortices, between 
the prefrontal and parietal cortices and between the temporal and occipital cortices. 
The hubs included the left STG, the right MTG, the left lingual gyrus (LG), the left 
cuneus, the right middle occipital gyrus (MOG), the right supramarginal gyrus (SMG), 
the right insula, the right precuneus, the right CG and the right parahippocampal gyrus
(PHG) (Figure 1). Additionally, a functional network (188 nodes and 280 edges) for
the incongruency effect (AVincon > AVcon) was detected in AD, mainly including the
intra-regional connectivity within the prefrontal, occipital, parietal cortices and the
motor strip, and inter-regional connectivity between the prefrontal and parietal
cortices and between the parietal cortex and the motor strip. The hubs were the
bilateral STG, the bilateral IPL, the bilateral postcentral gyrus (PostCG), the left
precuneus, the left precentral gyrus (PreCG), the right medial frontal gyrus
(MedialFG), the left insula and the bilateral CG (Figure 1).


However, the contrast of AVcon > AVincon failed to identify a significant functional
network in CH. The incongruency effect (AVincon > AVcon) was detected in a
functional network (222 nodes and 368 edges) in CH, primarily encompassing
intra-regional connectivity within the prefrontal, occipital and parietal cortices and
the motor strip, and inter-regional connectivity between the prefrontal and limbic
cortices. The hubs were the bilateral LG, the bilateral cuneus, the right FuG, the left
MOG, the right superior/middle/inferior frontal gyrus (SFG/MFG/IFG), the bilateral
inferior parietal lobule (IPL), the left PreCG, the bilateral lentiform nucleus (LN) and
the right insula (Figure 1).

[Figure 1 image. Recovered panel title: "Congruency effect in AD". The matrix plots list
nine lobes on each axis: prefrontal, motor strip, insula, parietal, temporal, occipital,
limbic, cerebellum and subcortical. Color bars show the between-condition difference in
Fisher's z: z(AVcon) - z(AVincon) for the congruency effect and z(AVincon) - z(AVcon)
for the incongruency effect.]

Figure 1 The functional networks underpinning congruency effect (AVcon >AVincon) 
or incongruency effect (AVincon >AVcon) of audiovisual integration in normally 
developing children and adults, respectively. The colors of the nodes in the brain plots 
indicate the lobe (coded by color bands along the matrix plots) to which they belong. 
The large nodes represent hubs, whose sizes are proportional to the node strengths. 
The matrix plots in the right panel represent connectivity strength between pairs of the 
9 brain lobes. Within each lobe, left hemisphere nodes are at the top (left) while right 
hemisphere nodes are at the bottom (right), separated by thin lines. The color of each
element in the matrices represents the sum of the weights of all the edges for the
connected lobes. CH = normally developing children, AD = adults. AVcon =
audiovisually congruent characters, AVincon = audiovisually incongruent characters.
L = left, R = right.


Critically, the 2 (group: CH and AD) x 2 (audio-visual congruency: AVcon and 
AVincon) repeated measures ANCOVA identified a significant interaction between 
group and audio-visual congruency in a functional network comprising 5 nodes and 3 
edges, which could be segmented into two subnetworks. The first subnetwork 
encompassed the left STG, the right MedialFG and the right SFG, forming a 
prefrontal-superior temporal functional network [interaction effect: F(1, 32) = 21.65,
p < 0.001, partial η² = 0.40]. The second subnetwork comprised the left thalamus and
the right lentiform nucleus [interaction effect: F(1, 32) = 15.75, p < 0.001, partial η² =
0.33]. In both subnetworks, the AD group showed a significant congruency effect
(AVcon > AVincon) [first subnetwork: F(1, 32) = 22.17, p < 0.001, partial η² = 0.41;
second subnetwork: F(1, 32) = 5.73, p = 0.023, partial η² = 0.15], while CH showed
an incongruency effect (AVcon < AVincon) [first subnetwork: F(1, 32) = 5.10, p =
0.031, partial η² = 0.14; second subnetwork: F(1, 32) = 12.60, p = 0.001, partial η² =
0.28] (Figure 2A).
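
The effect sizes reported alongside these F statistics can be recovered from the F value and its degrees of freedom using the standard identity: partial eta squared equals F times the effect degrees of freedom, divided by that product plus the error degrees of freedom. The short sketch below applies it to the statistics reported in this paragraph; the function name is illustrative.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Interaction effects of the two subnetworks, F(1, 32) = 21.65 and 15.75.
first = round(partial_eta_squared(21.65, 1, 32), 2)
second = round(partial_eta_squared(15.75, 1, 32), 2)
print(first, second)  # 0.4 0.33
```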

Figure 2 Functional networks underlying differences in audiovisual integration for
reading between children and adults, and their correlations with reading ability. (A)
Functional networks with significant interactions between group and audio-visual
congruency. The color of the nodes indicates the lobes they belong to. (B) Correlation
between the congruency effect (calculated by connectivity strength of the
prefrontal-superior temporal network) and reading performance in normally
developing children and adults. CH = normally developing children, AD = adults.
AVcon = audiovisually congruent characters, AVincon = audiovisually incongruent
characters. MedialFG = medial frontal gyrus, SFG = superior frontal gyrus, STG =
superior temporal gyrus, LN = lentiform nucleus. L = left, R = right. + P < 0.1, * P <
0.05, ** P < 0.01, *** P < 0.001.

Correlation analyses revealed that in CH, the congruency effect (calculated by
connectivity strength of the prefrontal-superior temporal network) was marginally
correlated with reading accuracy (r = 0.57, FDR-corrected p = 0.06) and significantly
correlated with phonological awareness (r = 0.70, FDR-corrected p = 0.018), but not
with reading fluency (r = 0.42, FDR-corrected p = 0.162). There were no significant
correlations in the AD group (reading accuracy: r = -0.05, FDR-corrected p = 0.883;
reading fluency: r = -0.20, FDR-corrected p = 0.646; phonological awareness: r = 0.56,
FDR-corrected p = 0.116) (Figure 2B). A Spearman correlation analysis showed that in
the AD group, the congruency effect (calculated by connectivity strength of the
prefrontal-superior temporal network) was positively correlated with accuracy in the
AVincon condition (r = 0.55, uncorrected p = 0.032), while in CH, the correlation
showed a similar trend but was not significant (r = 0.39, uncorrected p = 0.125).
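
To make the brain-behavior analysis concrete, the sketch below shows one way such a correlation could be computed. It assumes that the congruency effect is each participant's network connectivity strength in AVcon minus that in AVincon, and it implements Spearman's rho as a Pearson correlation on average ranks. All numbers are made-up illustrative values, not data from this study, and the function names are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of 1-based positions in the tie
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = mean(x), mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def congruency_effect(av_con, av_incon):
    """Per-participant difference in connectivity strength (AVcon minus AVincon)."""
    return [con - incon for con, incon in zip(av_con, av_incon)]


# Hypothetical per-participant values, for illustration only.
strength_con = [0.42, 0.35, 0.51, 0.30, 0.47]
strength_incon = [0.30, 0.33, 0.36, 0.31, 0.40]
accuracy_incon = [0.93, 0.88, 0.95, 0.90, 0.91]

effect = congruency_effect(strength_con, strength_incon)
print(f"Spearman rho = {spearman_rho(effect, accuracy_incon):.2f}")
```

A positive rho here would mean that participants with a larger congruency effect also tend to be more accurate in the AVincon condition, which is the direction of the relationship described for the AD group.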


Validation results 

The prefrontal-superior temporal network was replicated when the NBS procedure was
repeated with a more stringent primary threshold (p < 0.005) and when an additional
estimation method based on the NBS extent was used with a primary threshold of p < 0.01.
However, the analysis using the p < 0.005 threshold revealed no connectivity between
the right SFG and the right MedialFG, and the analysis based on the NBS extent
revealed no connectivity between the left thalamus and the right lentiform nucleus
(Figure S1A). Using an alternative Craddock atlas, the NBS analysis revealed a
significant interaction between group and audio-visual congruency in a functional
network involving the left MFG, the right frontal pole, the bilateral planum
temporale/STG and the left lateral occipital cortex [interaction effect: F(1, 32) = 11.76,
p = 0.002, partial η² = 0.27; AD showed a significant congruency effect: F(1, 32)
= 11.82, p = 0.002, partial η² = 0.27; CH showed a near-significant incongruency
effect: F(1, 32) = 2.88, p = 0.099, partial η² = 0.08] (Figure S1B). The validation
results support the reproducibility of the identified prefrontal-superior temporal
network underpinning the differences in audiovisual integration between children and
adults.
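
To illustrate why the choice of primary threshold can change which edges survive, the sketch below reproduces only the component-forming step of a network-based statistic: edges whose test passes the primary threshold are kept and grouped into connected components, whose extent is the number of edges. The permutation test that assigns a corrected p-value to each component is omitted. The edge list and p-values are hypothetical and chosen purely for illustration; in particular, which frontal node connects to the left STG is an assumption, not a reported result.

```python
from collections import defaultdict


def suprathreshold_components(edge_p, primary_threshold):
    """Group edges passing the primary threshold into connected components.

    Returns a list of components (each a list of edges), largest extent first.
    """
    kept = [edge for edge, p in edge_p.items() if p < primary_threshold]
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in kept:
        parent[find(a)] = find(b)

    components = defaultdict(list)
    for a, b in kept:
        components[find(a)].append((a, b))
    return sorted(components.values(), key=len, reverse=True)


# Hypothetical edge-wise p-values for the group x congruency interaction.
hypothetical_edge_p = {
    ("L_STG", "R_MedialFG"): 0.001,
    ("R_MedialFG", "R_SFG"): 0.008,
    ("L_Thalamus", "R_LN"): 0.004,
    ("L_STG", "L_FuG"): 0.200,
}

for threshold in (0.01, 0.005):
    comps = suprathreshold_components(hypothetical_edge_p, threshold)
    print(threshold, [len(c) for c in comps])
```

With these made-up values, the edge between the right MedialFG and the right SFG passes at p < 0.01 but not at p < 0.005, so the stricter threshold yields a smaller component, analogous in form to the pattern described above.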


Results of functional network analysis in children with dyslexia 

Since only the prefrontal-superior temporal network was replicated in the validation
analyses, we examined this network in the dyslexic group. ANCOVA revealed a
marginally significant interaction between group and audiovisual congruency [F(1, 26)
= 3.06, p = 0.092, partial η² = 0.11], but no significant main effects of group [F(1, 26)
= 1.10, p = 0.305] or audiovisual congruency [F(1, 26) = 1.53, p = 0.227].
A simple effects analysis showed a significant incongruency effect
(AVincon > AVcon) in CH [F(1, 26) = 5.58, p = 0.026, partial η² = 0.18], but not in
DD [F(1, 26) = 0.20, p = 0.661] (Figure 3).

Figure 3 Connectivity strength of normally developing children and children with
dyslexia in the prefrontal-superior temporal network. CH = normally developing
children, DD = children with dyslexia. AVcon = audiovisually congruent characters,
AVincon = audiovisually incongruent characters. L = left, R = right. * P < 0.05, ** P
< 0.01, *** P < 0.001. n.s. = not significant.


Discussion 

The present study aimed to explore the differences in functional brain networks in 
audiovisual integration for reading between children and adults. We found that during 
the lexical decision task, adults showed greater connectivity than children in a 
prefrontal-superior temporal network (encompassing the right medial frontal gyrus, 
the right superior frontal gyrus and the left superior temporal gyrus) and a 
thalamus-lentiform nucleus network (encompassing the left thalamus and the right 
lentiform nucleus), suggesting that these networks are associated with the 
development of audiovisual integration for reading. Moreover, the prefrontal-superior
temporal network was found to be disrupted in children with dyslexia, thus
confirming its role in audiovisual integration for reading. Taken together, our findings
reveal, for the first time, the brain mechanisms of audiovisual integration for reading
in adults and children, as part of multimodal information processing in higher
cognition.


Functional networks of audiovisual integration for reading 

Much research has identified areas associated with audiovisual integration such as 
STG, MTG, MFG, IFG, FuG, LG, cuneus, IPL, precuneus, insula and CG (Blau et al., 
2008; Erickson, Heeg, Rauschecker, & Turkeltaub, 2014; Hocking & Price, 2009; 
Holloway et al., 2015; Raij et al., 2000; van Atteveldt et al., 2010; van Atteveldt et al., 
2006; van Atteveldt et al., 2004; W. Xu et al., 2019). To our knowledge, the present 
study is the first to identify the functional brain networks of audiovisual integration 
for reading in both child and adult readers. Consistent with our predictions, 
audiovisual integration recruited a large-scale functional network in adults, involving 
intra-regional connectivity within the occipital cortex and inter-regional connectivity 
between the temporal and occipital cortices and between the prefrontal and temporal 
cortices. The left posterior STG was identified as the main hub, in line with previous 
findings showing its core role in audiovisual integration for both speech and 
non-speech stimuli (Erickson et al., 2014; Bethany Plakke & Romanski, 2019; van 
Atteveldt et al., 2004; Ye, Rüsseler, Gerth, & Münte, 2017). When presented with
incongruent audiovisual stimuli during a lexical decision task, both children and
adults showed enhanced intra-regional connectivity within the prefrontal, occipital,
and parietal regions, as well as the motor strip, consistent with evidence implicating
frontoparietal and pre-supplementary motor areas in response inhibition (R. Zhang, 
Geng, & Lee, 2017). A previous neural framework for reading (Price & Devlin, 2011) 
has proposed that connectivity within the occipital cortex reflects bottom-up 
transmission of visual features to ventral occipitotemporal cortex (vOT, including 
FuG and LG), and the connectivity between the STG and vOT reflects top-down 
generation of predictions formed from prior experience. Moreover, top-down 
processing is modulated by higher order regions (prefrontal cortex) associated with 
attention and task demands (E. K. Miller & J. D. Cohen, 2001; Price & Devlin, 2011; 
Yoncheva, Zevin, Maurer, & McCandliss, 2010). In the current lexical decision task, 
the input of auditory speech sounds might strengthen the interaction between
top-down and bottom-up hierarchies, which in turn affects the recognition of visual
characters. However, we did not detect significant networks in the AVcon > AVincon
contrast in normally developing children. Presumably, children might be less sensitive to
the audiovisual congruency effect during reading, given their lower level of linguistic
knowledge and experience.


Functional networks underlying the developmental changes in audiovisual 
integration for reading 

Network analysis revealed that in a prefrontal-superior temporal network, adults 
showed stronger connectivity in the AVcon condition than in the AVincon condition
(congruency effect), while children showed a reverse pattern. The congruency effect
(calculated by the connectivity strength of the network) was positively correlated with 
reading accuracy and phonological awareness in children, but not in adults. This result
suggests that this brain network is important for reading skill, particularly in
developing readers. Indeed, previous studies have highlighted the role of phonology
in reading (Ho & Bryant, 1997; Karipidis et al., 2017; Melby-Lervåg, Lyster, &
Hulme, 2012), and reliance on phonological processing is greater in
children than in adults (X. Liu et al., 2018). Two of the nodes identified in the
network, the MedialFG and SFG, are located in the prefrontal cortex (PFC) (Carlén, 2017),
which is part of the associative cortex of the frontal lobe (Calvert, 2001; Fuster, 1985). The
PFC receives a wide array of sensory inputs from multiple modalities (Macaluso &
Driver, 2005; Martínez-Sanchis, 2014; Bethany Plakke & Romanski, 2019;
Sugihara, Diltz, Averbeck, & Romanski, 2006). In addition, the PFC has been
hypothesized to support executive functions such as attention, inhibitory control and
decision making (Carlén, 2017). Specifically, the bilateral MedialFG is generally
involved in conflict detection (Aarts, Roelofs, & van Turennout, 2009; Bolger,
Hornickel, Cone, Burman, & Booth, 2008; Doehrmann & Naumer, 2008; Noppeney,
Ostwald, & Werner, 2010) and attentional control (Aarts et al., 2009; Bush, Luu, &
Posner, 2000; E. K. Miller & J. D. Cohen, 2001). Moreover, the right SFG
has been reported to subserve conflict resolution mechanisms by utilizing top-down
control of attentional resources (Corbetta & Shulman, 2002; Muller et al., 2011; R.
Zhang et al., 2017). Besides the MedialFG and SFG, another critical node
in the network is the left anterior STG. This finding is consistent
with previous studies reporting the engagement of the anterior superior temporal
cortex in letter-speech sound integration (Hocking & Price, 2009; van Atteveldt et al., 
2006). Furthermore, the anterior STG has been identified as a semantic hub enabling 
activation of semantic representations, irrespective of the input modality, such as 
written words, auditory sounds and pictures (Lambon Ralph, Sage, Jones, & 
Mayberry, 2010; Visser & Lambon Ralph, 2011). During reading, semantic, 
phonological and orthographic representations are automatically accessed, even when 
semantic processing is not required (Perfetti & Tan, 1998; Y. Xu, Pollatsek, & Potter, 
1999; S. L. Zhang, Perfetti, & Yang, 1999), and this is especially true in Chinese, 
which has direct mappings between orthography and semantics (Y. Liu & Perfetti, 
2003). Thus, in our lexical decision task, both phonological and semantic
representations are activated (Specht et al., 2003; Yates, Locker, & Simpson, 2003),
but given that semantic skills undergo prolonged development throughout childhood
and adolescence (Moore-Parks et al., 2010; Vannest, Karunanayaka, Schmithorst,
Szaflarski, & Holland, 2009), adults might automatically activate semantic
representations, whereas children might tend to activate phonological representations.
This difference may be reflected in the differences in connectivity of the anterior STG
between children and adults.


There is anatomical evidence of structural projections from the STG to the PFC in
nonhuman primates (Petrides & Pandya, 2002; B. Plakke & Romanski, 2016) as well
as in humans (Garell et al., 2012). The uncinate fasciculus and the arcuate fasciculus
are two main fiber tracts that link the STG with the PFC, forming the anatomical
substrates for the transmission of auditory or multisensory information (Bethany 
Plakke & Romanski, 2019). Functionally, the connectivity between the STG and PFC 
has been linked to attention shifting (Pammer, Hansen, Holliday, & Cornelissen, 2006; 
Paraskevopoulos et al., 2015) and behavioral responses to visual stimuli (Donner et al., 
2007). Attention is a vital factor for the late development of audiovisual integration 
skill (Burr & Gori, 2012; Dick et al., 2010), and it has been found to mediate 
audiovisual integration processing at multiple stages (including visual and auditory 
processing, spatiotemporal realignment, congruency matching and semantic analysis) 
in both bottom-up and top-down fashion (Koelewijn, Bronkhorst, & Theeuwes, 2010; 
Navarra, Alsius, Soto-Faraco, & Spence, 2010; Talsma et al., 2010). According to the 
framework of the multifaceted interplay between multisensory integration and 
attention (Talsma et al., 2010), audiovisual integration tends to occur pre-attentively 
(i.e., bottom-up) when speech inputs are congruent with visual characters, which in 
turn enhances the perceptual processing of the task-relevant modality. In contrast, when
speech input is in conflict with visual character input, top-down attentional
mechanisms are required to inhibit the task-irrelevant stimuli that act as
attention-capturing distractors. Evidence of such top-down mechanisms comes from the
positive correlation, observed in adults, between the congruency effect in the
prefrontal-superior temporal network and accuracy in the AVincon condition.
Consequently, the maturation of the prefrontal-superior temporal network may allow
skilled adult readers, compared with developing children, to take greater advantage of
congruent phonology and to suppress incongruent interference more effectively during
the recognition of visually presented characters.


In addition, the differences in audiovisual integration between children and adults 
were also found in a thalamus-lentiform nucleus network (the connectivity between 
the left thalamus and the right lentiform nucleus). The thalamus is an interface
through which nearly all sensory information must pass before reaching the cerebral
cortex (McCormick & Bal, 1994). The lentiform nucleus is a part of the striatum that
receives massive projections from the thalamus (Russchen & Jonker, 1988). Previous
studies have demonstrated that the fronto-striato-thalamic pathway is associated
with inhibitory capacity, which develops from childhood to adulthood (Rubia, Smith,
Taylor, & Brammer, 2007). Therefore, the subcortical thalamus-lentiform nucleus
network may support the inhibition of interference caused by incongruent auditory
speech sounds. However, this result was not replicated in the validation procedure and
therefore needs further verification.


The disruption of the prefrontal-superior temporal network in developmental 
dyslexia 

To further confirm whether the prefrontal-superior temporal network is a key neural
circuit for audiovisual integration specific to reading, we examined this network in a
group of dyslexic children with impaired reading skills. In line with our expectation,
we observed disruption of the prefrontal-superior temporal network in dyslexic
children. This result is in accordance with previous evidence of abnormalities in a
functional network composed of STG/STS and medial prefrontal cortex in German
dyslexics in an audiovisual speech integration task (Ye et al., 2017). Since the
prefrontal-superior temporal network is involved in cross-modal attention shifts
(Pammer et al., 2006; Paraskevopoulos et al., 2015), its disruption potentially signals
difficulties in shifting attention between modalities in children with dyslexia, a
phenomenon known as “sluggish attentional shifting” (Hari & Renvall, 2002; Harrar
et al., 2014). As a result, children with dyslexia may focus their limited attentional
resources on task-related visual stimuli (characters) and be less affected
(facilitated/inhibited) by auditory input. Overall, the lack of a congruency/incongruency
effect in children with dyslexia suggests atypical development of the
prefrontal-superior temporal network. The present findings further elucidate the
neural foundations of developmental dyslexia and thus have the potential to inform
assessment and intervention programs in developmental dyslexia (Hillock-Dunn &
Wallace, 2012; Schlaggar et al., 2002).


Limitations

Some limitations should be considered in light of the present findings. First, we 
employed a visual lexical decision task to examine the effect of simultaneously 
presenting speech sounds and visual characters in a realistic reading context 
(Knoop-van Campen, Segers, & Verhoeven, 2020). However, further research is
required to test whether our findings apply in the context of auditory
perception, that is, the extent to which presenting visual information affects auditory
perception during audiovisual integration.


Second, our findings apply to the Chinese writing system, which is non-transparent
and differs drastically from alphabetic writing systems in both the visual
features of its script and its orthography-to-phonology correspondences.
As previous studies have shown that audiovisual integration depends on the degree of
orthographic transparency (Holloway et al., 2015), further work is necessary to
examine the generalizability of the present results to transparent/semitransparent
writing systems.


Third, although the effective sample size of the present study meets the requirement
for acceptable statistical power, future studies with a larger sample of participants are
needed to obtain more reliable and repeatable results. In addition, further studies
might consider recruiting participants from a wider age range and using a longitudinal
design to assess changes in the networks of audiovisual integration for reading along
the neurodevelopmental trajectory.


Conclusion 

The present study revealed differences between normally developing children and
skilled adult readers in a prefrontal-superior temporal network that is involved in
audiovisual integration for reading. The findings presumably reflect the effect of
attentional modulation on audiovisual integration. In addition, the network identified
was disrupted in a group of children with developmental dyslexia, thus highlighting
its importance in reading. We argue that the present study is the first to unveil the
neural mechanisms of audiovisual integration for reading in children and adults,
potentially reflecting neurodevelopmental changes due to the development of reading
skills, and advancing our understanding of neural correlates of multimodal sensory
integration in humans.